Abstract. Implicit authentication consists of a server authenticating a
user based on the user’s usage profile, instead of or in addition to relying
on something the user explicitly knows (passwords, private keys, etc.).
While implicit authentication makes identity theft by third parties more
difficult, it requires the server to learn and store the user’s usage profile.
Recently, the first privacy-preserving implicit authentication system was
presented, in which the server does not learn the user’s profile. It uses
an ad hoc two-party computation protocol to compare the user’s freshly
sampled features against the user’s encrypted stored profile. The protocol
requires storing the usage profile and comparing against it using two
different cryptosystems, one of them order-preserving; furthermore, features
must be numerical. We present here a simpler protocol based on
set intersection that has the advantages of: i) requiring only one cryptosystem;
ii) not leaking the relative order of fresh feature samples; iii)
being able to deal with any type of features (numerical or non-numerical).

Keywords: Privacy-preserving implicit authentication, privacy-preserving
set intersection, implicit authentication, active authentication, transparent
authentication, risk mitigation, data brokers.


1 Introduction 

The recent report [?] by the U.S. Federal Trade Commission calls for
transparency and accountability of data brokers. On the one hand, the report
describes the pervasive data collection carried out by data brokers as clearly
privacy-invasive. On the other hand, it presents risk mitigation services offered
by data brokers as the good side of data collection, to the extent that such
services protect consumers against identity theft. Indeed, storing information on
how a consumer usually interacts with a service (time of the day, usual places,
usual sequence of keystrokes, etc.) allows using this information to implicitly
authenticate a user: implicit authentication [?] (a.k.a. transparent
authentication [5] or active authentication [?]) is the process of comparing the user’s current
usage profile with the stored profile. If both profiles disagree, maybe someone is
impersonating the user, e.g. after some identity theft (password theft, etc.).

The above risk mitigation argument is part of a long-standing simplistic
tendency in digital services (and elsewhere) to justify privacy invasion in the name
of legitimate interests, as if the latter were incompatible with privacy (another
old example is intellectual property protection, which was portrayed as being
incompatible with the anonymity of digital content consumers until anonymous
fingerprinting was proposed [1618114]). In fact, implicit authentication turns out
to be a weak excuse to justify the storage of and/or access to the usage
profiles of users by servers. In [17] it was shown how to make implicit authentication
compatible with the privacy of users. The idea is that the server only needs an
encrypted version of the user’s usage profile.


1.1 Contribution and plan of this paper 

The protocol in [17] needs the server to store the users’ accumulated usage
profiles encrypted under two different cryptosystems, one that is homomorphic and
one that is order-preserving. We present here a protocol for privacy-preserving
implicit authentication based on set intersection, which has the advantage that
the server only needs to store the users’ accumulated usage profiles encrypted
under one (homomorphic) cryptosystem. This saves storage at the carrier
and also computation during the protocol. Also, unlike [17], our protocol
does not leak the relative order of fresh feature samples collected by the user’s
device for comparison with the encrypted profile. Finally, our protocol can deal
with any type of features (numerical or non-numerical), while the protocol in [17]
is restricted to numerical features.

The rest of this paper is organized as follows. Section [2] gives background
on implicit authentication and on the privacy-preserving implicit authentication
protocol of [17]. Section [3] discusses how to compute the dissimilarity between
two sets depending on the type of their elements. Section [4] presents a robust
privacy-preserving set intersection protocol that can effectively be used for
implicit authentication. The privacy, security and complexity of the new protocol
are analyzed in Section [5]. Experimental results are reported in Section [6].
Finally, conclusions and future research directions are summarized in Section [7].
The Appendix gives background on privacy-preserving set intersection, recalls
the Paillier cryptosystem and justifies the correctness of the least obvious steps
of our protocol.


2 Background 

We first specify the usual setting of implicit authentication and we then move 
to privacy-preserving implicit authentication. 




2.1 Implicit authentication 


The usual scenario of implicit authentication is one in which the user carries
a mobile networked device (called just user’s device in what follows) such as a
cell phone, tablet, notebook, etc. The user wishes to authenticate to a server
in order to use some application. The user may (or may not) use a primary
password authentication mechanism. To strengthen such a primary authentication
or even to replace it, the user resorts to implicit authentication [?]. In this type
of authentication, the history of a user’s actions on the user’s device is used to
construct a profile for the user that consists of a set of features. In [?] empirical
evidence was given that the features collected from the user’s device history are
effective to distinguish users and therefore can be used to implicitly
authenticate them (instead of or in addition to explicit authentication based on the user’s
providing a password).

The types of features collected on the user’s actions fall into three categories: 
(i) device data, like GPS location data, WiFi/Bluetooth connections and other 
sensor data; (ii) carrier data, such as information on cell towers seen by the 
device, or Internet access points; and (iii) cloud data, such as calendar entries. 
It is not safe to store the accumulated profile of the user in the user’s device, 
because an intruder might compromise the device and alter the stored profile in 
order to impersonate the legitimate user. Hence, for security, the profile must be 
stored by some external entity. However, the user’s profile includes potentially 
sensitive information and storing it outside the user’s device violates privacy. 

Implicit authentication systems try to mitigate the above privacy problem 
by using a third party, the carrier (i.e. the network service provider) to store the 
user’s profiles. Thus, the typical architecture consists of the user’s device, the 
carrier and the application servers. The latter want to authenticate the user and 
they collaborate with the carrier and the user’s device to do so. The user’s device 
engages in a secure two-party computation protocol with the carrier in order to 
compare the fresh usage features collected by the user’s device with the user’s 
profile stored at the carrier. The computation yields a score that is compared (by 
the carrier or by the application server) against a threshold, in order to decide 
whether the user is accepted or rejected. In any case, the application server trusts 
the score computed by the carrier. 


2.2 Privacy-preserving implicit authentication 

In the privacy-preserving implicit authentication system proposed in [17], the
user’s device encrypts the user’s usage profile at set-up time, and forwards it 
to the carrier, who stores it for later comparison. There is no security problem 
because during normal operation the user’s device does not store the user’s 
profile (it just collects the fresh usage features). There is no privacy problem 
either, because the carrier does not see the user’s profile in the clear. 

The core of proposal [17] is the algorithm for computing the dissimilarity
score between two inputs: the fresh sample provided by the user’s device and
the profile stored at the carrier. All the computation takes place at the carrier
and both inputs are encrypted: indeed, the carrier stores the encrypted profile
and the user’s device sends the encrypted fresh sample to the carrier. Note that
the keys to both encryptions are only known to the user’s device (it is the device
that encrypted everything).

The carrier computes a dissimilarity score at the feature level, while provably 
guaranteeing that: i) no information about the profile stored at the carrier is 
revealed to the device other than the average absolute deviation of the stored 
feature values; ii) no information about the fresh feature value provided by the 
device is revealed to the carrier other than how it is ordered with respect to the 
stored profile feature values. 

The score computation protocol in [17] uses two different encryption schemes:
a homomorphic encryption scheme HE (for example, Paillier’s [15]) and an
order-preserving symmetric encryption scheme OPSE (for example, [1]). For
each feature in the accumulated user’s profile, two encrypted versions are created,
one under HE and the other under OPSE. Similarly, for each feature in the fresh
sample it collects, the user’s device computes two encrypted versions, under HE
and OPSE, respectively, and sends them to the carrier. The following process
is repeated for each feature:

1. Using the HE ciphertexts the carrier performs some computations (additions 
and scalar multiplications) relating the encrypted fresh sampled feature value 
and the set of encrypted feature values in the stored encrypted user’s profile. 

2. The output of the previous computations is returned to the user’s device, 
which decrypts it, re-encrypts it under OPSE and returns the re-encrypted 
value to the carrier. 

3. Using the order-preserving properties, the carrier can finally compute a
dissimilarity score evaluating how different the fresh sampled feature is from
those stored in the encrypted user’s profile. This score can be roughly
described as the number of feature values in the stored encrypted profile that
are less dissimilar from the median of the stored values than the fresh
sampled value.

The authors of [17] point out that, in case of a malicious user’s device (e.g.
as a result of it being compromised), one cannot trust the device to provide
the correct HE-encrypted version of the fresh sampled feature. Nor can it be
assumed that the device returns correct OPSE-encryptions in Step 2 above.
In [17], a variant of the privacy-preserving implicit authentication protocol is
presented in which the device proves the correctness of HE-encrypted fresh
sampled features and does not need to provide OPSE-encrypted values. This
version is secure against malicious devices, but its complexity is substantially
higher.

Other shortcomings of [17]:

— It is restricted to numerical features, due to the kind of computations that
need to be performed on them. However, among the example features listed
in Section [2.1] there are some features that are not numerical, like the list
of cell towers or Internet access points seen by the user’s device.

— It discloses the following information: i) to the carrier, how the fresh
sample is ordered with respect to the stored profile feature values; ii) to the
user’s device, the average absolute deviation of the stored feature values.

We present a privacy-preserving implicit authentication protocol based on 
set intersection that deals with the above shortcomings. 

3 Dissimilarity between sets depending on the data type 

Based on [2], we recall here how the dissimilarity between two data sets X and Y
can be evaluated using set intersection. If we let X be the user’s profile and Y be
the fresh sample collected by the user’s device, our privacy-preserving implicit
authentication setting presents the additional complication (not present in [2])
that X is only available in encrypted form (the carrier stores only the encrypted
user’s profile). Anyway, we describe here the case of two plaintext sets X and Y
and we will deal with encrypted sets in the following sections.

3.1 Case A: independent nominal feature values 

Assume X and Y consist of qualitative values, which are independent and binary,
that is, the relationship between two values is equality or nothing. Take as an
example the names of the network or phone providers seen by the user’s device,
the operating system run by the device and/or the programs installed in the
device. In this case, the dissimilarity between X and Y can be evaluated as the
multiplicative inverse of the size of the intersection of X and Y, that is $1/|X \cap Y|$,
when the intersection is not empty. If it is empty, we say that the dissimilarity
is $\infty$.

Clearly, the more coincidences there are between X and Y, the more similar the
profile stored at the carrier is to the fresh sample collected by the device.
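
As an illustration of the Case A measure on plaintext sets, the following minimal Python sketch computes $1/|X \cap Y|$ and returns infinity for an empty intersection. The function name and the example feature values are hypothetical and chosen only for this illustration; the sketch ignores encryption, which is dealt with in later sections.

```python
import math


def dissimilarity_nominal(profile, sample):
    """Case A on plaintext sets: 1/|X ∩ Y|, or infinity if X ∩ Y is empty."""
    common = set(profile) & set(sample)
    return math.inf if not common else 1 / len(common)


# Hypothetical independent nominal features (provider, network, OS, app names).
profile = {"carrierA", "wifi-home", "os-android", "app-mail"}
sample = {"carrierA", "wifi-cafe", "os-android"}

# Two values coincide, so the dissimilarity is 1/2.
print(dissimilarity_nominal(profile, sample))
```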

3.2 Case B: correlated categorical feature values 

As in the previous case, we assume the feature values are expressed as qualitative 
features. However, these may not be independent. For example, if the feature 
values are the IDs of cell towers or Internet access points seen by the device, 
nearby cell towers/access points are more similar to each other than distant cell 
towers/access points. 

In this case, the dissimilarity between X and Y cannot be derived from the
size of their intersection alone.

Assume we have an integer correlation function $\ell : E \times E \to \mathbb{Z}_+$ that measures
the similarity between the values in the sets of features held by the device and
the carrier, where E is the domain where the sets of features of both players take
values. For nominal features, semantic similarity measures can be used for this
purpose [?]; for numerical features that take values over bounded and discrete
domains, standard arithmetic functions can be used. Assume further that both
the device and the carrier know this function $\ell$ from the very beginning.


Here the dissimilarity between the set X and the set Y can be computed as

$$\frac{1}{\sum_{x \in X} \sum_{y \in Y} \ell(x, y)}$$

when the denominator is nonzero. If it is zero, we say that the distance is $\infty$.
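
The Case B measure can be sketched on plaintext sets as follows. The similarity function `tower_similarity` is a hypothetical example of $\ell$ (it is not taken from the paper): it treats cell tower IDs as integers and considers consecutive IDs to be neighbouring towers.

```python
import math


def dissimilarity_correlated(profile, sample, sim):
    """Case B on plaintext sets: inverse of the sum of sim(x, y) over X x Y."""
    total = sum(sim(x, y) for x in profile for y in sample)
    return math.inf if total == 0 else 1 / total


def tower_similarity(a, b):
    """Hypothetical integer similarity: 2 if equal, 1 if adjacent IDs, else 0."""
    gap = abs(a - b)
    return 2 if gap == 0 else 1 if gap == 1 else 0


profile = {10, 11, 20}
sample = {11, 12, 30}

# Nonzero terms: sim(10, 11) = 1, sim(11, 11) = 2, sim(11, 12) = 1, total 4.
print(dissimilarity_correlated(profile, sample, tower_similarity))
```

With the indicator function `lambda x, y: int(x == y)` as `sim`, the same code returns the Case A value $1/|X \cap Y|$.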


3.3 Case C: numerical feature values 

In this case, we want to compute the dissimilarity between two sets of numerical
values based on set intersection. Numerical features in implicit authentication
may include GPS location data, other sensor data, etc. Assume $U = \{u_1, \dots, u_t\}$
and $V = \{v_1, \dots, v_t\}$. A way to measure the dissimilarity between U and V is
to compute $\sum_{i=1}^{t} |u_i - v_i|$.

4 Robust privacy-preserving set intersection for implicit 
authentication 

It will be shown further below that computing dissimilarities in the above three
cases A, B and C can be reduced to computing the cardinality of set intersections.
Furthermore, this can be done without the carrier revealing X and without the
user’s device revealing Y, as required in the implicit authentication setting. The
idea is that, if the dissimilarity stays below a certain threshold, the user is
authenticated; otherwise, authentication is refused.

In Appendix [A] we give some background on privacy-preserving set intersection
protocols in the literature. Unfortunately, all of them assume an honest-but-curious
situation, but we need a privacy-preserving set intersection protocol that
works even if the adversary is a malicious one: notice that the user’s device may
be corrupted, that is, under the control of some adversary. Hence we proceed to specify
a set intersection protocol that remains robust in the malicious scenario and
we apply it to achieving privacy-preserving implicit authentication in Case A.
We then extend it to Cases B and C. We make use of Paillier’s cryptosystem [15],
which is recalled in Appendix [B].


4.1 Implicit authentication in Case A 

Set-up Let the plaintext user’s profile be $(a_1, \dots, a_s)$. In this phase, the user’s
device transfers the encrypted user’s profile to the carrier. To do so, the user’s
device does:

1. Generate the Paillier cryptosystem with public key $pk = (n, g)$ and secret
key $sk$.

2. Compute the polynomial $p(x) = \prod_{i=1}^{s} (x - a_i) = p_0 + p_1 x + p_2 x^2 + \cdots + p_s x^s$.

3. Compute $\mathrm{Enc}(p_0), \dots, \mathrm{Enc}(p_s)$, where $\mathrm{Enc}(p_i) = g^{p_i} r_i^{n} \bmod n^2$.

4. Randomly choose $R' \in \mathbb{Z}_{n^2}$. Find $r'_0, \dots, r'_s \in \mathbb{Z}_{n^2}$ such that

$$R' = r'_0 \cdot {r'_1}^{a_j} \cdot {r'_2}^{a_j^2} \cdots {r'_s}^{a_j^s} \bmod n^2, \quad j = 1, \dots, s. \qquad (1)$$

Note that the system (1) has a trivial solution $r'_0 = R'$ and $r'_1 = \cdots = r'_s = 1$,
but, since it is underdetermined ($s + 1$ unknowns and $s$ equations), it has
many non-trivial solutions too (see correctness analysis in Appendix [C]).

5. Compute $R_i = r'_i / r_i \bmod n^2$. Randomly choose an integer $d \in \mathbb{Z}_n$. Send

$$pk,\; \mathrm{Enc}(p_0), \dots, \mathrm{Enc}(p_s);\; R_0^{d}, \dots, R_s^{d} \bmod n^2$$

to the carrier. Locally delete all data computed during the set-up protocol,
but keep $(d, R')$ secret.
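
To make Steps 2 and 3 of the set-up more concrete, the sketch below builds the coefficients of $p(x)$ modulo $n$, encrypts them with a toy Paillier instance and checks that the homomorphically evaluated $\mathrm{Enc}(p(b))$ decrypts to 0 exactly when $b$ is in the profile. Everything that is not in the text above is an assumption made for illustration: the tiny primes, the choice $g = n + 1$, the helper names and the sample profile. Decryption is shown only to verify the arithmetic; in the protocol the carrier never decrypts.

```python
import math
import random

P, Q = 1009, 1013                    # toy primes, far too small for real use
N, N2 = P * Q, (P * Q) ** 2
G = N + 1                            # assumed generator choice g = n + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                 # modular inverse, needs Python 3.8+


def encrypt(m, rng):
    """Paillier encryption Enc(m) = g^m * r^n mod n^2 with a random unit r."""
    while True:
        r = rng.randrange(2, N)
        if math.gcd(r, N) == 1:
            break
    return pow(G, m % N, N2) * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption for g = n + 1 (used here only as a check)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def poly_from_roots(roots):
    """Coefficients p_0, ..., p_s (mod n) of the product of (x - a_i)."""
    coeffs = [1]
    for a in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % N       # contribution of -a * c_i * x^i
            nxt[i + 1] = (nxt[i + 1] + c) % N   # contribution of c_i * x^(i+1)
        coeffs = nxt
    return coeffs


def eval_encrypted(enc_coeffs, b):
    """Enc(p(b)) as the product of Enc(p_i)^(b^i), without any decryption."""
    acc = 1
    for i, c in enumerate(enc_coeffs):
        acc = acc * pow(c, pow(b, i, N), N2) % N2
    return acc


rng = random.Random(1)
profile = [3, 17, 42, 99]            # hypothetical integer-coded features
enc_coeffs = [encrypt(c, rng) for c in poly_from_roots(profile)]

print(decrypt(eval_encrypted(enc_coeffs, 17)))   # 17 is a root of p
print(decrypt(eval_encrypted(enc_coeffs, 5)))    # 5 is not a root of p
```

The first printed value is 0 because 17 belongs to the profile; the second is $p(5) \bmod n$, which is nonzero.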

Implicit authentication protocol As discussed in Section [3.1], in case of
independent nominal feature values (Case A), dissimilarity is computed as $1/|X \cap Y|$.
Hence, to perform implicit authentication the carrier just needs to compute
the cardinality of the intersection between the fresh sample collected by the
user’s device and the user’s profile stored at the carrier. The challenge is that
the carrier only holds the encrypted user’s profile and the user’s device no
longer holds the user’s profile, either in plaintext or in ciphertext.

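Let $Y = \{b_1, \dots, b_t\} \subseteq E$ be the fresh sample collected by the user’s device.
Then the device and the carrier engage in the following protocol:

Step 1 The carrier randomly chooses $\theta$, and sends $pk$, $\mathrm{Enc}(p_0)^{\theta}, \dots, \mathrm{Enc}(p_s)^{\theta}$;
$R_0^{d}, \dots, R_s^{d}$ to the user’s device.

Step 2 The user’s device picks a random integer $r(j) \in \mathbb{Z}_{n^2}$ for every $1 \le j \le t$.
The device computes for $1 \le j \le t$

$$\mathrm{Enc}(r(j) \cdot d \cdot \theta \cdot p(b_j)) = \mathrm{Enc}(p(b_j))^{d \cdot \theta \cdot r(j)}$$

$$= \left( \mathrm{Enc}(p_0) \cdots \mathrm{Enc}(p_s)^{b_j^s} \right)^{d \cdot \theta \cdot r(j)}$$

$$= g^{r(j) \cdot d \cdot \theta \cdot p(b_j)} \, \gamma_j^{\,n \cdot d \cdot \theta} \bmod n^2$$

where $\gamma_j = (r_0 \cdot r_1^{b_j} \cdot r_2^{b_j^2} \cdots r_s^{b_j^s})^{r(j)} \bmod n^2$. The user’s device then
computes $\Upsilon_j = (R_0 \cdot R_1^{b_j} \cdot R_2^{b_j^2} \cdots R_s^{b_j^s})^{d \, r(j)} \bmod n^2$. For all $j$, the device
randomly orders and sends

$$\{(\mathrm{Enc}(r(j) \cdot d \cdot \theta \cdot p(b_j)),\; \Upsilon_j,\; R'^{\,r(j) d})\} \qquad (2)$$

to the carrier.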

Step 3 For $1 \le j \le t$, the carrier does:

— Compute $\mathrm{Enc}(r(j) \cdot d \cdot \theta \cdot p(b_j)) \cdot \Upsilon_j^{\,n\theta}$;

— From Expression (1), if $b_j = a_i$ for some $i \in \{1, \dots, s\}$, then $p(b_j) = 0$
and hence $\mathrm{Enc}(r(j) \cdot d \cdot \theta \cdot p(b_j)) \cdot \Upsilon_j^{\,n\theta} = R'^{\,r(j) d n \theta}$; note that the carrier
can recognize $R'^{\,r(j) d n \theta}$ by raising $R'^{\,r(j) d}$ received in Expression (2) to
$n\theta$. Otherwise (if $b_j \ne a_i$ for all $i \in \{1, \dots, s\}$) $\mathrm{Enc}(r(j) \cdot d \cdot \theta \cdot p(b_j))$
looks random. See correctness analysis in Appendix [C].

If both parties are honest, then the carrier learns $|X \cap Y|$ but obtains no
information about the elements in X or Y.


4.2 Implicit authentication in Case B 

Here, the carrier inputs X and the user’s device inputs Y, two sets of features,
and they want to know how close X and Y are without revealing their own set.
In the protocol below, only the carrier learns how close X and Y are.

We assume that the domain of X and Y is the same, and we call it E. The
closeness or similarity between elements is computed by means of a similarity function.
In particular, we consider functions $\ell : E \times E \to \mathbb{Z}_+$. Observe that Case A is a
particular instance of this Case B in which $\ell(x, x) = 1$ and $\ell(x, y) = 0$ for $x \ne y$.

Let Y be the input of the user’s device. For every $z \in E$, the device computes
$\ell_z = \sum_{y \in Y} \ell(z, y)$. Observe that $\ell_z$ measures the overall similarity of $z$ and Y.
Let $Y' = \{z \in E : \ell_z > 0\}$. It is common to consider functions satisfying
$\ell(z, z) > 0$ for every $z \in E$, and so in general $Y \subseteq Y'$.

An implicit authentication protocol for such a computation can be obtained
from the protocol in Case A (Section [4.1]), by replacing Steps 2 and 3 there with
the following ones:

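Step 2’ For every $z \in Y'$, the user’s device picks $\ell_z$ random integers $r(1), \dots,
r(\ell_z) \in \mathbb{Z}_{n^2}$ and for $1 \le j \le \ell_z$ does

— Compute

$$\mathrm{Enc}(r(j) \cdot d \cdot \theta \cdot p(z)) = \mathrm{Enc}(p(z))^{d \cdot \theta \cdot r(j)}$$

$$= \left( \mathrm{Enc}(p_0) \cdots \mathrm{Enc}(p_s)^{z^s} \right)^{d \cdot \theta \cdot r(j)}$$

$$= g^{r(j) \cdot d \cdot \theta \cdot p(z)} \, \gamma_j^{\,n \cdot d \cdot \theta} \bmod n^2$$

where $\gamma_j = (r_0 \cdot r_1^{z} \cdot r_2^{z^2} \cdots r_s^{z^s})^{r(j)} \bmod n^2$.

— Compute $\Upsilon_j = (R_0 \cdot R_1^{z} \cdot R_2^{z^2} \cdots R_s^{z^s})^{d \, r(j)} \bmod n^2$.

— Let $E_j = \{(\mathrm{Enc}(r(j) \cdot d \cdot \theta \cdot p(z)),\; \Upsilon_j,\; R'^{\,r(j) d})\}$.

Finally, the user’s device randomly re-orders the sequence of all computed
$E_j$ for all $z \in Y'$ (a total of $\sum_{z \in Y'} \ell_z$ elements) and sends the randomly
re-ordered sequence of $E_j$’s to the carrier.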

Step 3’ For every received $E_j$, the carrier does

— Compute $\mathrm{Enc}(r(j) d \theta \cdot p(z)) \cdot \Upsilon_j^{\,n\theta}$;

— From Expression (1), if $z \in X$, then $p(z) = 0$ and hence $\mathrm{Enc}(r(j) d \cdot \theta \cdot
p(z)) \cdot \Upsilon_j^{\,n\theta} = R'^{\,r(j) d n \theta}$ (see correctness analysis in Appendix [C]); otherwise
(if $z \notin X$) $\mathrm{Enc}(r(j) d \theta \cdot p(z))$ looks random.

Hence, at the end of the protocol, the total number of $E_j$ which yield $R'^{\,r(j) d n \theta}$ is

$$\sum_{x \in X} \ell_x = \sum_{x \in X} \sum_{y \in Y} \ell(x, y),$$

that is, the sum of similarities between the elements of X and Y. This clearly
measures how similar X and Y are. At the end of the protocol, the carrier knows
$|Y'|$ and the device knows $|X|$. Besides that, neither the carrier nor the device can
gain any additional knowledge on the elements of each other’s set of preferences.
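
The counting argument behind Steps 2’ and 3’ can be checked on plaintext data: if every $z \in Y'$ is submitted $\ell_z$ times, the number of submissions that hit an element of X equals the double sum of similarities. The sketch below omits all cryptography and uses a hypothetical domain and similarity function (adjacent integer IDs count as neighbours), chosen only for illustration.

```python
from collections import Counter


def expanded_sample(sample, domain, sim):
    """Multiplicities l_z = sum over y in Y of sim(z, y), kept only if l_z > 0."""
    weights = Counter()
    for z in domain:
        lz = sum(sim(z, y) for y in sample)
        if lz > 0:
            weights[z] = lz        # z belongs to Y' and is sent l_z times
    return weights


def matches_seen_by_carrier(profile, sample, domain, sim):
    """Number of submitted copies whose underlying z lies in the profile X."""
    weights = expanded_sample(sample, domain, sim)
    return sum(count for z, count in weights.items() if z in profile)


def tower_similarity(a, b):
    """Hypothetical integer similarity: 2 if equal, 1 if adjacent IDs, else 0."""
    gap = abs(a - b)
    return 2 if gap == 0 else 1 if gap == 1 else 0


domain = range(40)                 # hypothetical common domain E
profile = {10, 11, 20}             # X, held (encrypted) by the carrier
sample = {11, 12, 30}              # Y, held by the device

count = matches_seen_by_carrier(profile, sample, domain, tower_similarity)
direct = sum(tower_similarity(x, y) for x in profile for y in sample)
print(count, direct)               # both values are 4
```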


4.3 Implicit authentication in Case C 

Let the plaintext user’s profile be a set U of t numerical features, which we
denote by $U = \{u_1, \dots, u_t\}$. The device’s fresh sample corresponding to those
features is $V = \{v_1, \dots, v_t\}$. The carrier wants to learn how close U and V are,
that is, $\sum_{i=1}^{t} |u_i - v_i|$.

Define $X = \{(i, j) : u_i > 0 \text{ and } 1 \le j \le u_i\}$ and $Y = \{(i, j) : v_i > 0 \text{ and } 1 \le
j \le v_i\}$. Now, take the set-up protocol defined in Section [4.1] for Case A and run
it by using X as plaintext user profile. Then take the implicit authentication
protocol for Case A and run it by using Y as the fresh sample input by the
device. In this way, the carrier can compute $|X \cap Y|$. Observe that

$$|X \cap Y| = |\{(i, j) : u_i, v_i > 0 \text{ and } 1 \le j \le \min\{u_i, v_i\}\}| = \sum_{1 \le i \le t} \min\{u_i, v_i\}.$$

In the set-up protocol for Case A, the carrier learns $|X|$ and during the implicit
authentication protocol for Case A, the carrier learns $|Y|$. Hence, the carrier can
compute

$$|X| + |Y| - 2|X \cap Y| = \sum_{i=1}^{t} (\max\{u_i, v_i\} + \min\{u_i, v_i\}) - 2 \sum_{i=1}^{t} \min\{u_i, v_i\}$$

$$= \sum_{i=1}^{t} (\max\{u_i, v_i\} - \min\{u_i, v_i\}) = \sum_{i=1}^{t} |u_i - v_i|.$$
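
The reduction of Case C to set intersection can be verified on plaintext values. The sketch below assumes non-negative integer features, builds the sets of pairs $(i, j)$ defined above and checks that $|X| + |Y| - 2|X \cap Y|$ equals the sum of absolute differences. Function names and the example vectors are hypothetical.

```python
def unary_pairs(values):
    """Set {(i, j) : 1 <= j <= values[i]} for non-negative integer features."""
    return {(i, j)
            for i, v in enumerate(values, start=1)
            for j in range(1, v + 1)}


def l1_from_intersection(u, v):
    """Sum of |u_i - v_i| recovered as |X| + |Y| - 2|X ∩ Y|."""
    x, y = unary_pairs(u), unary_pairs(v)
    return len(x) + len(y) - 2 * len(x & y)


u = [3, 0, 5]                      # hypothetical stored profile U
v = [1, 2, 5]                      # hypothetical fresh sample V

# |X| = 8, |Y| = 8, |X ∩ Y| = 1 + 0 + 5 = 6, so the result is 4.
print(l1_from_intersection(u, v))
print(sum(abs(a - b) for a, b in zip(u, v)))
```

Both printed values are 4. The size of X grows with the magnitude of the feature values, which is the cost discussed for Case C in the complexity analysis.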

5 Privacy, security and complexity 

Unless otherwise stated, the assessment in this section will focus on the protocols
of Case A (Section [4.1]), the protocols of Cases B and C being extensions of Case A.

5.1 Privacy and security 

We define privacy in the following two senses: 

— After the set-up is concluded, the user’s device does not keep any information 
about the user’s profile sent to the carrier. Hence, compromise of the user’s 
device does not result in compromise of the user’s profile. 

— The carrier learns nothing about the plaintext user’s profile, except its size. 
This allows the user to preserve the privacy of her profile towards the carrier. 

Lemma 1. After set-up, the user’s device does not keep any information on the 
user’s profile sent to the carrier. 

Proof. The user’s device only keeps $(d, R')$ at the end of the set-up protocol.
Both $d$ and $R'$ are random and hence unrelated to the user’s profile. □


Lemma 2. The carrier or any eavesdropper learns nothing about the plaintext
user’s profile, except its size.

Proof. After set-up, the carrier receives $pk$, $\mathrm{Enc}(p_0), \dots, \mathrm{Enc}(p_s)$; $R_0^{d}, \dots,
R_s^{d} \bmod n^2$. Since $d$ is random and unknown to the carrier, $R_0^{d}, \dots, R_s^{d}
\bmod n^2$ look random to the carrier and will give him no more information about
the plaintext user’s profile than the Paillier ciphertexts $\mathrm{Enc}(p_0), \dots, \mathrm{Enc}(p_s)$.
That is, the carrier learns nothing about the user’s plaintext profile $X = \{a_1,
\dots, a_s\}$ except its size $s$. The same holds true for an eavesdropper listening to
the communication between the user’s device and the carrier during set-up.

At Step 2 of implicit authentication, the carrier only gets the fresh sample
Y encrypted under Paillier and randomly re-ordered. Hence, the carrier learns
no information on Y, except its size $t$. At Step 3, the carrier learns $|X \cap Y|$, but
not knowing Y, the size $|X \cap Y|$ of the intersection leaks to him no information
on X. □

If we define security of implicit authentication as the inability of a dishonest 
user’s device to disrupt the authentication outcome, we can state the following 
result. 

Lemma 3. A dishonest user’s device has no better strategy to alter the outcome 
of implicit authentication than trying to randomly guess the user’s profile. 

Proof. At the end of the set-up protocol, the (still uncompromised) user’s
device keeps no information about the user’s profile (Lemma [1]). Hence, if the user’s
device is later compromised and/or behaves dishonestly, it still has no clue on
the real user’s profile against which its fresh sample is going to be authenticated.
Hence, either the user’s device provides an honest fresh sample and implicit
authentication will be correctly performed, or the user’s device provides a random
fresh sample with the hope that it matches the user’s profile. □

5.2 Complexity 

Case A During the set-up protocol, the user’s device needs to compute:

— $s + 1$ Paillier encryptions for the polynomial coefficients;

— values $r'_0, \dots, r'_s$; as explained in Appendix [C], this can be done by randomly
choosing $r'_0$, then solving an $s \times s$ generalized Vandermonde system (doable
in $O(s^2)$ time using [?]) and finally computing $s$ modular powers to find the
$r'_1, \dots, r'_s$;

— $s + 1$ modular powers (raising the $R_i$ values to $d$).

During the implicit authentication protocol, the user’s device needs to
compute (Step 2):

— $t$ Paillier encryptions;

— $ts$ modular powers (to compute the $\Upsilon_j$ values);

— $t$ modular powers (to raise $R'$ to $r(j) d$).

Also during the implicit authentication protocol, the carrier needs to
compute:

— At Step 1, $s + 1$ modular powers (to raise the encrypted polynomial
coefficients to $\theta$);

— At Step 3, $t$ Paillier encryptions;

— At Step 3, $t$ modular powers (to raise the $\Upsilon_j$ values to $n\theta$).


Case B The set-up protocol does not change w.r.t. Case A. In the implicit
authentication protocol, the highest complexity occurs when $Y' = E$ and the
similarity function $\ell$ always takes the maximum value in its range, say $L$. In this
case,

$$\sum_{z \in Y'} \ell_z = \sum_{z \in Y'} \sum_{y \in Y} \ell(z, y) = |E| s L.$$

Hence, in the worst case the user’s device needs to compute (Step 2’):

— $|E| s L$ Paillier encryptions;

— $|E| s L$ modular powers (to compute the $\Upsilon_j$ values);

— $|E| s L$ modular powers (to raise $R'$ to $r(j) d$).

Also during the implicit authentication protocol, the carrier needs to
compute:

— At Step 1, $s + 1$ modular powers (to raise the encrypted polynomial
coefficients to $\theta$);

— At Step 3’, $|E| s L$ Paillier encryptions;

— At Step 3’, $|E| s L$ modular powers (to raise the $\Upsilon_j$ values to $n\theta$).

Note that the above complexity can be reduced by reducing the range of the
similarity function $\ell$.

Case C Case C is analogous to Case A but the sets X and Y whose
intersection is computed no longer have $s$ and $t$ elements, respectively. According to
Section [4.3], the maximum value for $|X|$ occurs when all $u_i$ take the maximum
value of their range, say, $M$, in which case X contains $tM$ pairs $(i, j)$. By a
similar argument, Y also contains at most $tM$ pairs.

Hence, the worst-case complexity for Case C is obtained by performing the
corresponding changes in the assessment of Case A. Specifically, during the
set-up protocol, the user’s device needs to compute:

— $tM + 1$ Paillier encryptions for the polynomial coefficients;

— Solve a $tM \times tM$ Vandermonde system (doable in $O((tM)^2)$ time) and then
compute $tM$ modular powers to find the $r'_i$ values;

— Compute $tM + 1$ modular powers (raising the $R_i$ values to $d$).

During the implicit authentication protocol, the user’s device needs to
compute (Step 2):

— $tM$ Paillier encryptions;

— $t^2 M^2$ modular powers (to compute the $\Upsilon_j$ values);

— $tM$ modular powers (to raise $R'$ to $r(j) d$).

Also during the implicit authentication protocol, the carrier needs to
compute:

— At Step 1, $tM + 1$ modular powers (to raise the encrypted polynomial
coefficients to $\theta$);

— At Step 3, $tM$ Paillier encryptions;

— At Step 3, $tM$ modular powers (to raise the $\Upsilon_j$ values to $n\theta$).

Note that the above complexities can be reduced by reducing the range of
the numerical values in sets U and V.
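
The relation between the Case A and Case C counts for the device during authentication can be written down as a small calculation. The sketch below only restates the operation counts listed above (it measures nothing); the function and key names are hypothetical.

```python
def device_auth_ops_case_a(s, t):
    """Device-side operation counts in Step 2 for profile size s, sample size t."""
    return {
        "paillier_encryptions": t,
        "upsilon_modular_powers": t * s,
        "r_prime_modular_powers": t,
    }


def device_auth_ops_case_c(t, M):
    """Worst case for Case C: both sets contain t*M pairs, then apply Case A."""
    return device_auth_ops_case_a(s=t * M, t=t * M)


# Example: 4 numerical features with values up to 10 give sets of 40 pairs.
print(device_auth_ops_case_a(s=20, t=20))
print(device_auth_ops_case_c(t=4, M=10))
```

For $t = 4$ and $M = 10$ the dominant term is $t^2 M^2 = 1600$ modular powers, which shows how quickly the cost grows with the range of the numerical values.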

6 Experimental results 

As stated in the previous section, the complexity of our implicit authentication
protocol ultimately depends on the sizes of the input sets. In Case A, the sizes
of the sets are directly given by the user inputs; in Case B, these sizes are the
product of the size of the input sets times the range of the similarity function
$\ell$; and in Case C, the sizes are given by the size of the original sets times the
range of their values. We ran an experiment to test the execution times of our
protocol, based on Case A, to which the other two cases can be reduced.

The experiment was implemented in Sage 6.4.1 and run on a Debian 7.7
machine with a 64-bit architecture, an Intel i7 processor and 8 GB of physical
memory. We instantiated a Paillier cryptosystem with a 1024-bit long $n$, and the
features of preference sets were taken from the integers in the range $[1 \dots 2^{128}]$.
The input sets ranged from size 1 to 50, and we took feature sets of the same
size to execute the set-up and the authentication protocols.

Step 4 of the set-up protocol (Section [4.1]), in which a system of equations is
solved for $r'_i$ for $1 \le i \le s$, is the most expensive part of the set-up protocol. As
a worst-case setting, we used straightforward Gaussian elimination, which takes
time $O(s^3)$, although, as mentioned above, specific methods like [7] exist for
generalized Vandermonde matrices that can run in $O(s^2)$ (such specific methods
could be leveraged in case of smartphones with low computational power). On
the other hand, Step 2 of the authentication protocol (Section [4.1]), computed
by the user’s device, is easily parallelizable for each feature in the sample set.
Since parallelization can be exploited by most of the current smartphones in the
market, we also exploited it in our experiment. The results are shown in Table [1]
(times are in seconds).

Note that the set-up protocol is run only once (actually, maybe once in 
a while), so it is not time-critical. However, the authentication protocol is to 
be run at every authentication attempt by the user. For example, if a user 
implicitly authenticates herself using the pattern of her 20 most visited websites, 
authentication with our proposal would take 3.37 seconds, which is perfectly 
acceptable in practice. 


Table 1. Execution times (in seconds) for different input set sizes

Set size          1     5     10    15    20    25     30     35     40     45      50
Set-up            0.89  0.79  1.1   1.83  4.67  11.45  24.65  47.6   84.99  144.81  228.6
Authentication    0.08  0.47  1.05  2.0   3.37  5.4    8.27   12.13  17.3   23.39   31.2


7 Conclusions and future research 

To the best of our knowledge, we have presented the second privacy-preserving
implicit authentication system in the literature (the first one was [17]). The
advantages of our proposal with respect to [17] are:

— The carrier only needs to store the user’s profile encrypted under one
cryptosystem, namely Paillier’s.

— Dishonest behavior or compromise at the user’s device after the initial
set-up stage neither compromises the privacy of the user’s profile nor affects the
security of authentication.

— Our proposal is not restricted to numerical features, but can deal also with 
all sorts of categorical features. 

— In case of numerical or categorical ordinal features, our proposal does not 
disclose how the fresh sample is ordered with respect to the feature values 
in the stored user’s profile. 

For binary or independent nominal features, the complexity of our proposal 
is quite low (quadratic in the number of values in the user’s profile). For
correlated categorical feature values, the complexity is higher, but it can be reduced
by decreasing the range of the similarity function used. Finally, in the case of 
numerical values, the complexity is also higher than in the binary/independent 
nominal case, but it can be reduced by decreasing the range of the numerical 
feature values. 

Future research will include devising ways to further decrease the
computational complexity in all cases.

A Background on privacy-preserving set intersection


Secure multiparty computation (MPC) allows a set of parties to compute
functions of their inputs in a secure way without requiring a trusted third party.
During the execution of the protocol, the parties do not learn anything about
each other’s input except what is implied by the output itself. There are two
main adversarial models: honest-but-curious adversaries and malicious
adversaries. In the former model, the parties follow the protocol instructions but
try to obtain information about the inputs of other parties from the messages
they receive. In the latter model, the adversary may deviate from the protocol
in an arbitrary way.

We restrict ourselves here to a two-party setting in which the input of each
party is a set, and the desired output is the cardinality of the intersection of
both sets. The intersection of two sets can be obtained by using generic
constructions based on Yao’s garbled circuits. This technique allows computing
any arithmetic function, but it is inefficient for most functions. Many recent
works on two-party computation focus on improving the efficiency of these
protocols for particular families of functions.

Freedman, Nissim, and Pinkas [5] presented a more efficient method to
compute the set intersection, a private matching scheme, that is secure in the
honest-but-curious model. A private matching scheme is a protocol between a
client $C$ and a server $S$ in which $C$’s input is a set $X$ of size $i_C$, $S$’s
input is a set $Y$ of size $i_S$, and at the end of the protocol $C$ learns
$X \cap Y$. The scheme uses polynomial-based techniques and homomorphic
encryption schemes. Several variations of the private matching scheme were also
presented in [9]: an extension to the malicious adversary model, an extension to
the multi-party case, and schemes to compute the cardinality of the set
intersection and other functions. Constructing efficient schemes for set
operations is an important topic in MPC and has been studied in many other
contributions. Several other works present new protocols to compute the set
intersection cardinality.
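The polynomial-based idea behind private matching can be sketched without any cryptography: one party encodes its set as the roots of a polynomial, and an element of the other set belongs to the intersection exactly when the polynomial evaluates to zero on it. The following minimal Python sketch works in the clear, so it offers no privacy at all. In a scheme such as [5] the coefficients would be homomorphically encrypted before leaving the client. The function names and the choice of the prime modulus $2^{61} - 1$ are illustrative assumptions.

```python
def poly_from_roots(roots, modulus):
    """Coefficients p_0..p_s (lowest degree first) of prod (x - a) mod modulus."""
    coeffs = [1]
    for a in roots:
        # Multiply the current polynomial by (x - a).
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % modulus
            nxt[i + 1] = (nxt[i + 1] + c) % modulus
        coeffs = nxt
    return coeffs


def evaluate(coeffs, x, modulus):
    """Evaluate the polynomial at x with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % modulus
    return acc


def intersection_cardinality(X, Y, modulus=2**61 - 1):
    """Count the elements of Y that are roots of the polynomial built from X.

    Elements are assumed to be non-negative integers below the prime modulus.
    """
    coeffs = poly_from_roots(X, modulus)
    return sum(1 for y in set(Y) if evaluate(coeffs, y, modulus) == 0)


if __name__ == "__main__":
    print(intersection_cardinality({3, 7, 11, 20}, {7, 20, 99}))
```

Non-numerical values (for example, visited websites) could be handled by first mapping each value to an integer, for instance with a hash function.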


B Paillier’s cryptosystem 

In this cryptosystem, the public key consists of an integer $n$ (the product of
two RSA primes) and an integer $g$ of order $n$ modulo $n^2$, for example,
$g = 1 + n$. The secret key is $\phi(n)$, where $\phi(\cdot)$ is Euler’s totient
function.

Encryption of a plaintext integer $m$, with $m < n$, involves selecting a random
integer $r < n$ and computing the ciphertext $c$ as

$$c = Enc(m) = g^m \cdot r^n \bmod n^2 = (1 + mn) r^n \bmod n^2.$$

Decryption consists of first computing
$c_1 = c^{\phi(n)} \bmod n^2 = 1 + m\phi(n)n \bmod n^2$ and then
$m = \frac{c_1 - 1}{n} \cdot \phi(n)^{-1} \bmod n$, where the division by $n$ is
exact over the integers and $\phi(n)^{-1}$ is the inverse of $\phi(n)$ modulo $n$.

The homomorphic properties of Paillier’s cryptosystem are as follows: 



- Homomorphic addition of plaintexts. The product of two ciphertexts
  decrypts as the sum of their corresponding plaintexts:

  $$D(E(m_1, r_1) \cdot E(m_2, r_2) \bmod n^2) = m_1 + m_2 \bmod n.$$

  Also, the product of a ciphertext times $g$ raised to a plaintext decrypts as
  the sum of the corresponding plaintexts:

  $$D(E(m_1, r_1) \cdot g^{m_2} \bmod n^2) = m_1 + m_2 \bmod n.$$

- Homomorphic multiplication of plaintexts. An encrypted plaintext raised to
  the power of another plaintext will decrypt to the product of the two
  plaintexts:

  $$D(E(m_1, r_1)^{m_2} \bmod n^2) = m_1 m_2 \bmod n.$$

  More generally, given a constant $k$,
  $D(E(m_1, r_1)^k \bmod n^2) = k m_1 \bmod n$.
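The encryption, decryption, and homomorphic properties described in this appendix can be checked numerically with a toy implementation. The sketch below is only illustrative: the primes 1009 and 1013 are far too small for any real use, the function names are ours, and $g = 1 + n$ is assumed as in the example above. It needs Python 3.8 or later for modular inverses via `pow`.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy key generation: returns (n, phi(n)). The primes are NOT secure."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if math.gcd(n, phi) != 1:
        raise ValueError("phi(n) must be invertible modulo n")
    return n, phi


def encrypt(n, m, r=None):
    """Enc(m) = (1 + m*n) * r^n mod n^2, with g = 1 + n."""
    n2 = n * n
    if r is None:
        # Draw a random r in [1, n-1] that is coprime to n.
        while True:
            r = secrets.randbelow(n - 1) + 1
            if math.gcd(r, n) == 1:
                break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, phi, c):
    """Recover m from c using the secret key phi(n)."""
    n2 = n * n
    c1 = pow(c, phi, n2)  # equals 1 + m*phi(n)*n mod n^2
    return ((c1 - 1) // n) * pow(phi, -1, n) % n


if __name__ == "__main__":
    n, phi = keygen()
    n2 = n * n
    m1, m2, k = 123, 456, 7
    c1, c2 = encrypt(n, m1), encrypt(n, m2)
    # Product of ciphertexts decrypts to the sum of plaintexts.
    print(decrypt(n, phi, (c1 * c2) % n2) == (m1 + m2) % n)
    # Ciphertext times g^m2 decrypts to the sum of plaintexts.
    print(decrypt(n, phi, (c1 * pow(1 + n, m2, n2)) % n2) == (m1 + m2) % n)
    # Ciphertext raised to a constant decrypts to the scaled plaintext.
    print(decrypt(n, phi, pow(c1, k, n2)) == (k * m1) % n)
```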


C Correctness 

In general, the correctness of our protocol follows from direct algebraic
verification using the properties of Paillier’s cryptosystem. We go next through the
least obvious steps. 


C.1 Set-up protocol

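In the set-up protocol, $r'_0, \ldots, r'_s$ are found as a solution of the
following system:

$$
\begin{aligned}
R' &= r'_0 \cdot {r'_1}^{a_1} \cdot {r'_2}^{a_1^2} \cdots {r'_s}^{a_1^s} \bmod n^2 \\
   &\;\;\vdots \\
R' &= r'_0 \cdot {r'_1}^{a_s} \cdot {r'_2}^{a_s^2} \cdots {r'_s}^{a_s^s} \bmod n^2
\end{aligned}
$$

The above system has $s + 1$ unknowns and $s$ equations. Therefore it has one
degree of freedom. To avoid the trivial solution $r'_0 = R'$ and
$r'_1 = \cdots = r'_s = 1$, we choose a random $r'_0$. Then we divide the system
by $r'_0$ and take logarithms to get

$$
\begin{pmatrix} \log(R'/r'_0) \\ \log(R'/r'_0) \\ \vdots \\ \log(R'/r'_0) \end{pmatrix} \bmod n =
\begin{pmatrix}
a_1 & a_1^2 & \cdots & a_1^s \\
a_2 & a_2^2 & \cdots & a_2^s \\
\vdots & \vdots & & \vdots \\
a_s & a_s^2 & \cdots & a_s^s
\end{pmatrix}
\begin{pmatrix} \log r'_1 \\ \log r'_2 \\ \vdots \\ \log r'_s \end{pmatrix}
$$

The matrix on the right-hand side of the above system is an $s \times s$
generalized Vandermonde matrix (not quite a Vandermonde matrix). Hence, using
the techniques in [7], the system can be solved in $O(s^2)$ time for
$\log r'_1, \ldots, \log r'_s$. Then $s$ powers modulo $n^2$ need to be computed
to turn $\log r'_i$ into $r'_i$ for $i = 1, \ldots, s$ (recall that $r'_0$ was
chosen directly).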
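As a small worked instance (added here for illustration, and not part of the original derivation), take $s = 2$ and write $L = \log(R'/r'_0)$. The linear system above then has a closed-form solution, provided that $a_1$, $a_2$ and $a_2 - a_1$ are invertible in the ring in which the logarithms are taken. All fractions below denote multiplication by the corresponding modular inverses.

```latex
\begin{pmatrix} L \\ L \end{pmatrix}
=
\begin{pmatrix} a_1 & a_1^2 \\ a_2 & a_2^2 \end{pmatrix}
\begin{pmatrix} \log r'_1 \\ \log r'_2 \end{pmatrix},
\qquad
\det \begin{pmatrix} a_1 & a_1^2 \\ a_2 & a_2^2 \end{pmatrix}
= a_1 a_2 (a_2 - a_1).

% Cramer's rule:
\log r'_1 = \frac{L\,(a_2^2 - a_1^2)}{a_1 a_2 (a_2 - a_1)}
          = \frac{L\,(a_1 + a_2)}{a_1 a_2},
\qquad
\log r'_2 = \frac{L\,(a_1 - a_2)}{a_1 a_2 (a_2 - a_1)}
          = -\frac{L}{a_1 a_2}.

% Check of the first equation:
a_1 \cdot \frac{L\,(a_1 + a_2)}{a_1 a_2} - a_1^2 \cdot \frac{L}{a_1 a_2}
= \frac{L\,(a_1 + a_2)}{a_2} - \frac{L\,a_1}{a_2} = L.
```

The determinant makes explicit why the profile values must be distinct and non-zero for the system to have a unique solution.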












C.2 Implicit authentication protocol 


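We specify in more detail the following derivation in Step 2 of the implicit
authentication protocol of Section 4.1:

$$
\begin{aligned}
Enc(r(j) \cdot d \cdot \theta \cdot p(b_j))
 &= Enc(p(b_j))^{d \cdot \theta \cdot r(j)} \bmod n^2 \\
 &= \left( Enc(p_0) \cdots Enc(p_s)^{b_j^s} \right)^{d \cdot \theta \cdot r(j)} \bmod n^2 \\
 &= \left( g^{p_0} r_0^n \cdots (g^{p_s} r_s^n)^{b_j^s} \right)^{d \cdot \theta \cdot r(j)} \bmod n^2 \\
 &= \left( g^{p(b_j)} \right)^{d \cdot \theta \cdot r(j)} \left( r_0 \cdot r_1^{b_j} \cdots r_s^{b_j^s} \right)^{n \cdot d \cdot \theta \cdot r(j)} \bmod n^2 \\
 &= g^{r(j) \cdot d \cdot \theta \cdot p(b_j)} \left( r_0 \cdot r_1^{b_j} \cdots r_s^{b_j^s} \right)^{r(j) \cdot n \cdot d \cdot \theta} \bmod n^2.
\end{aligned}
$$

Regarding Step 3 of the implicit authentication protocol, we detail the case
$b_j = a_i$ for some $i \in \{1, \ldots, s\}$. In this case, $p(b_j) = 0$ and hence

$$
\begin{aligned}
Enc(r(j) d \theta \cdot p(b_j)) \cdot \Upsilon_j^{\theta} \bmod n^2
 &= Enc(0)^{r(j) d \theta} \cdot \Upsilon_j^{\theta} \bmod n^2 \\
 &= \left( r_0 \cdot r_1^{b_j} \cdots r_s^{b_j^s} \right)^{n r(j) d \theta} \cdot \Upsilon_j^{\theta} \bmod n^2 \\
 &= \left( r_0 \cdot r_1^{b_j} \cdots r_s^{b_j^s} \right)^{n r(j) d \theta} \cdot \left( R_0 \cdot R_1^{b_j} \cdots R_s^{b_j^s} \right)^{r(j) n \theta} \bmod n^2 \\
 &= \left( r'_0 \cdot {r'_1}^{a_i} \cdots {r'_s}^{a_i^s} \right)^{r(j) d n \theta} \bmod n^2
  = R'^{\, r(j) d n \theta} \bmod n^2. \qquad (3)
\end{aligned}
$$

If, in Step 3, we have $b_j \neq a_i$ for all $i \in \{1, \ldots, s\}$, then
Derivation (3) does not hold and a random number is obtained instead. On the one
hand, the powers of $g$ do not disappear from
$Enc(r(j) d \theta \cdot p(b_j))$. On the other hand, the exponents
$b_j, \ldots, b_j^s$ cannot be replaced by $a_i, \ldots, a_i^s$ as done in the
last step of Derivation (3). Hence, a random number different from
$R'^{\, r(j) d n \theta}$ is obtained.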
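The core of the Step 2 derivation, namely that raising the encrypted coefficients to the powers of $b_j$ and multiplying them yields an encryption of a blinded $p(b_j)$, can be checked with toy parameters. The sketch below is a simplification made for illustration only: it keeps a single blinding exponent (standing in for $d \cdot \theta \cdot r(j)$), omits $\Upsilon_j$ and the comparison against $R'$, and decrypts with the secret key just to inspect the result. The primes and all function names are assumptions of this sketch.

```python
import math
import secrets

P, Q = 1009, 1013  # toy primes, NOT secure
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)


def rand_unit():
    """Random integer in [1, N-1] that is coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r


def enc(m):
    """Paillier encryption with g = 1 + N; m is reduced modulo N first."""
    return ((1 + (m % N) * N) * pow(rand_unit(), N, N2)) % N2


def dec(c):
    """Paillier decryption with secret key phi(N)."""
    return ((pow(c, PHI, N2) - 1) // N) * pow(PHI, -1, N) % N


def poly_from_roots(roots):
    """Integer coefficients p_0..p_s (lowest degree first) of prod (x - a)."""
    coeffs = [1]
    for a in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] -= a * c
            nxt[i + 1] += c
        coeffs = nxt
    return coeffs


def encrypted_blinded_eval(enc_coeffs, b, blind):
    """Compute (prod_i Enc(p_i)^(b^i))^blind mod N^2 = Enc(blind * p(b))."""
    acc = 1
    for i, c in enumerate(enc_coeffs):
        acc = (acc * pow(c, b ** i, N2)) % N2
    return pow(acc, blind, N2)


if __name__ == "__main__":
    profile = [3, 7, 11]  # stored values a_1..a_s
    enc_coeffs = [enc(c) for c in poly_from_roots(profile)]
    for sample in (7, 8):  # fresh values b_j
        value = dec(encrypted_blinded_eval(enc_coeffs, sample, rand_unit()))
        print(sample, "matches profile" if value == 0 else "does not match")
```

When the sample is in the profile, the power of $g$ vanishes and only the randomness terms remain, which is what Derivation (3) exploits. Otherwise the decrypted value is a blinded non-zero number.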


